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Abstract 

Predictions of the uncertainty associated with extreme events are a vital component of any 
prediction system for such events. Consequently, the prediction system ought to be proba- 
bilistic in nature, with the predictions taking the form of probability distributions. This paper 
concerns probabilistic prediction systems where the data is assumed to follow either a gen- 
eralized extreme value distribution (GEV) or a generalized Pareto distribution (GPD). In this 
setting, the properties of proper scoring rules which facilitate the assessment of the prediction 
uncertainty are investigated and closed-from expressions for the continuous ranked probability 
score (CRPS) are provided. In an application to peak wind prediction, the predictive perfor- 
mance of a GEV model under maximum likelihood estimation, optimum score estimation with 
the CRPS, and a Bayesian framework are compared. The Bayesian inference yields the high- 
est overall prediction skill and is shown to be a valuable tool for covariate selection, while 
the predictions obtained under optimum CRPS estimation are the sharpest and give the best 
performance for high thresholds and quantiles. 

Keywords: Bayesian variable selection, continuous ranked probability score, extreme events, 
optimum score estimation, prediction uncertainty, wind gusts 



1 Introduction 

Extreme events in weather and climate such as high wind speeds, heavy precipitation or extremal 
temperatures are commonly associated with high impacts on both environment and society. How- 
ever, the physical processes leading to the extremes are usually generated on small scales and their 
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prediction is contaminated by large uncertainty. The need to determine uncertainties in the predic- 
tions of extreme events is stressed in the report from a recent workshop on extreme events in cli- 
mate and weather ( |Guttorp and Fuentes , 2010[ ). A prediction system for such events should there- 



fore be probabilistic in nature, allowing for an assessment of the associated uncertainty (Dawid 
T984j |Gneitingl|2008T ). 



The verification methods applied to these systems should thus necessarily be equipped to also 
handle the verification of uncertainty estimates. [Murphy ([1993) argues that a general prediction 
system should strive to perform well on three types of goodness: there should be consistency 
between the forecaster's judgment and the forecast, there should be correspondence between the 
forecast and the observation, and the forecast should be informative for the user. On a similar note, 
Gneiting et al. ( 2007[ ) state that the goal of probabilistic forecasting should be to maximize the 



sharpness of the predictive distribution subject to calibration. Here, calibration refers to the sta- 
tistical consistency between the predictive distribution and the observation, while sharpness refers 
to the concentration of the predictive distribution; the sharper the forecast, the higher information 
value will it provide. The prediction goal of Gneiting et al. (2007) is thus equivalent to Murphy's 
second and third type of goodness. 

Verification methods that aim to attain these goals have been extensively studied in the lit- 
erature, see e.g. Wilks ( 2011[ Chapter 8) for an excellent overview. In this paper, we focus on 



the prediction of extreme events and our main objective is to assess the characteristics of proper 
scoring rules for extreme value distributions. The framework of proper scoring rules can also be 
used for the parameter estimation in that a scoring rule is optimized over the training data, see e.g. 
Gneiting et al. ( 2005[ ). Optimum score estimation returns unbiased parameter estimates (Dawid, 



2007|) and for the ignorance score, this equals maximum likelihood estimation for independent 



observations. The continuous ranked probability score or CRPS (Unger[ 1985 ; Hersbach] 2000 



Gneiting and Raftery} 2007 [ ) is of particular interest in our context, as it simultaneously assesses 



all of Murphy's types of goodness. A closed form expression of the CRPS has been calculated for 



a normal distribution (Gneiting et al. 2005), for a mixture of normals (Grimit et al. 2006), for a 



truncated normal distribution ( |Gneiting et al.[ |2006[ ), and for the three parameter two-piece nor- 
mal distribution (Thorarinsd ottir and Gneiting! 2010[ ). We derive closed-form expressions for the 
CRPS for the generalized extreme value distribution (GEV) and the generalized Pareto distribution 
(GDP). 

In an application to peak wind prediction, we compare minimum CRPS estimation, maximum 
likelihood estimation, and a Bayesian approach under the GEV using predictive performance as the 
comparison criteria. Peak winds of short duration (few seconds) are a major cause of wind-related 
damage and crucial in wind hazard studies. Observations of peak winds are, however, sparse since 
only a small proportion of the weather stations provide peak wind speed observations. Using data 
from the observation station Valkenburg in the Netherlands, we investigate how observations of 
other weather variables, including mean wind speed, precipitation, and pressure, may be used as 
covariates to obtain a predictive distribution for the peak wind speed. Furthermore, we show how a 
Bayesian regression variable selection method can simplify the covariate selection procedure when 
the space of potential models is large. 

The remainder of the paper is organized as follows. In section[2j we discuss how the predictive 
performance of probabilistic forecasts for extreme events may be assessed and provide the closed 
form expressions for the CRPS under the GEV and the GPD. A detailed description of the deriva- 
tion is given in the appendix. Our case study on probabilistic forecasting and forecast verification 
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for peak wind speed is presented in section |3j The paper closes with a discussion in section |4} 



2 Assessing predictive performance 



Murphy and Winkler]( 1987 ) and[Murphy ( 1993] ) define a general framework for deterministic fore- 



cast verification based on two aspects of prediction quality: reliability and resolution. Reliability, 
or calibration, relates to the correspondence between the observation and the forecast. In our prob- 
abilistic setting, a forecast F is perfectly calibrated if the conditional distribution of the observation 
Y given the forecast is equal to F. Resolution is a measure of information content which is closely 
related to the sharpness of the forecast. In our setting, sharpness is defined with respect to the 
predictive distribution, where a sharp predictive distribution reflects high information content or 
entropy. Note that sharpness has to be conditioned on F being calibrated for it to be related to the 
resolution - otherwise it is merely an attribute of the forecast distribution. In the following, we dis- 
cuss various methods to assess the calibration, or the reliability, and the resolution of a predictive 
distribution for data assumed to follow an extreme value distribution as in @ or ([8]) below. 

2.1 Calibration 

Let Fi, . . . , F n denote the respective predictive distributions for the observations y 1 , . . . , y n . A 
standard method to assess the calibration of the forecasts is to apply the probability integral trans- 
form (PIT) ( jDawid , |1984 ). The PIT is given by the value of the predictive distribution at the 



observation, Fi(yi). If the forecasts are calibrated, the sample {Fj(?/j)}™ =1 should follow the stan- 
dard uniform distribution. This may e.g. be assessed graphically through a histogram, where a U- 
shaped histogram will indicate that the forecasts are underdispersive, while a fl-shaped histogram 
will indicate that the forecasts are overdispersive, see e.g. Wilks (2011 1. The PIT histogram is 



the continuous counterpart of the verification rank histogram or the Talagrand diagram (Anderson 
19961 IHamill and Coluccij[T997l ). 



The PIT values might also be displayed in terms of a PP-plot, where they are compared to n 
equally spaced percentiles of the standard uniform distribution. In extreme value theory, a very 
popular assessment of the calibration of a non-stationary extreme value model is the so-called 



residual quantile plot (Coles 2001). Residual quantile plots emphasize potential deviations in 
the upper tail of the distribution. To this end, the PIT values and the uniform percentiles are 
transformed to quantiles through some inverse distribution function G^ 1 . In general, the inverse 
Gumbel is used for G~ l , as it is the standard reference distribution for models of block maxima. 
For threshold models an exponential distribution is often used. 

The calibration might also be estimated over a restricted range of forecasts, for instance sit- 
uations with high forecast probability of extreme events. A stratified PIT histogram or residual 
quantile plot would then be evaluated based on {i^(2/i)}ie/ for some I C {1, . . . , n}. An impor- 
tant aspect here is that the subset / has to be chosen independently of the observations y%, . . . , y n , 
as the observed value is not known to the forecaster when a forecast is issued. 
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2.2 Scoring rules 



A variety of scores exist to assess the quality of a probabilistic forecast. An important characteristic 
of a qualified scoring rule is its propriety (Murphy, |1973| |Gneiting and Raftery} |2007^ |Brocker 
and Smith] 2007 ). A scoring rule is (strictly) proper if the expected score for an observation Y 
is optimized if (and only if) the true distribution of Y is issued as the forecast. Propriety will 
encourage honesty and prevent hedging which coincides with Murphy's first type of goodness 
( |Murphy[ 1993). We will only consider proper scoring rules in the following. Furthermore, the 
scoring rules are negatively oriented such that a lower value means a better score. 



A widely used scoring rule is the logarithmic score proposed by Good ( 1952) which in meteo 



rological applications is known under the name ignorance score (Roulston and Smith 2002). It is 
defined as 



SiGN{f,y) = - k>g{f(y)dy). 

The ignorance score applies to the predictive density function / and is proportional to the log- 
likelihood of the data with respect to the predictive density. Note, that dy is ignored when cal- 
culating the log-likelihood. However, it must be taken into account if transformed variables are 
compared. The associated expected score is the Shannon entropy, and the divergence function 
becomes the Kullback-Leibler divergence (|Gneiting and Raftery[|2007"]). It thus represents a score 



which is motivated by information theory. Both |Selten| (|1998 ) and |Grimit et al.| ( [20 06) have pointed 
out that, although the ignorance score is simple to calculate, it attributes a very strong penalty to 
events with low probability and is thus very sensitive to outliers. 

The continuous ranked probability score (CRPS) assesses the predictive skill of a forecast in 
terms of the entire predictive distribution F (Ungerj 1985; Hersbachj 2000) and can thus assess 
both the calibration and the sharpness of the forecast simultaneously. It is defined as 



/oo 
[F(t) - H(t - 
-oo 



dt 



(1) 



and compares the forecast distribution F and the empirical distribution of the observation y. Here, 
H{t—y) denotes the Heaviside step function using the half-maximum convention with H(0) = 0.5. 
Note that an observation error might easily be introduced in the CRPS, e.g. assuming Gaussian 
errors and using a Gaussian distribution instead of the Heaviside step function. 

The CRPS can be decomposed into a reliability and a resolution part ( |Hersbach , 2000 [Gneiting 
and Raftery} |2007| ). That is, we can write 



1 



S CRP (F, y) = E|X -y\- -K\X - X' 



(2) 



where X and X' are independent random variables with distribution F. In principle, any (strictly) 
proper scoring rules may be decomposed into an uncertainty, a reliability and, a resolution part 
( |Gneiting and Raftery} 2007, Brocker[ 2009). This representation of the CRPS is especially useful 
if a closed form expression of the CRPS is not available for F. Let x and x' denote two independent 
samples of size m from the predictive distribution F. The representation in ([2]) can then easily be 
approximated by 
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m ^ m 

S C rp(F, y) « |xi - y| - - ^ 



(3) 



Even though the forecast is given by a full predictive distribution F, it is often of interest to 
focus on the prediction of certain events, such as the probability of threshold exceedance. We can 
assess the forecasters ability to predict events over a given threshold u with the Brier score, 



S u B (F,y) = ( Pu ~l{y>u}f 



(4) 



where p u = 1 — F(u) is the predicted probability of the realized value being greater or equal to the 
threshold u (Brier} 1950). Note that the CRPS in ([T]) represents an integral of the Brier score over 
all possible thresholds. 

Similarly, we might want to focus on the predictive performance in the upper tail. This can be 
achieved by using the quantile score 



ST Q (F,y)=p T (y-F- 1 (T)) J 

where y is the event that materializes, r is the quantile of interest, p T 
p T (u) = (r — l)it otherwise (Gneiting and Raftery, 2007; Friederichs and Hense, 2007). A third 
alternative deviation of the CRPS is obtained using the quantile score, 



(5) 

Tii if it > 0, and 



S CR p(F,y) = 2 / p T {y - F-\r))dr 



(6) 



see 



Laio and Tamea| ( |2~007| ), |Gneiting and Raftery]( |2007[ ), and |Gneiting an d Ranjan|( |2011[ ). 



When comparing various forecasters, it might often be advantageous to compare skill scores 
rather than the scores themselves (Murphy} [1973} 1974). A skill score measures the relative gain 
of a forecast with respect to the reference forecast and it is defined as 



SS(F,y) 



S(F,y)-S(F ref ,y) 
S(F per f ect ,y) — S(F re f,y) 



where F Te f is a reference forecast used for all forecasters, and F per f ect is the perfect forecast. In 
the case of CRPS and ignorance score the respective score of a perfect forecast is zero. A zero 
skill score represents no gain in predictive skill, while a perfect forecast would have a skill score of 
100%. This approach is especially useful when comparing very high quantiles under the quantile 
score in ([5]) or very high thresholds under the Brier score in (|4]) as these will be based on relatively 
small amount of data. 



2.3 Scoring rules for extreme value distributions 

Events are generally classified as being extreme by one of two criteria: the extremes are either 
given by a block maxima or they are defined as all values that exceed a given threshold. The 
asymptotic behavior of the first type can be modeled by the generalized extreme value distribution 
(GEV), 



5 



1 GEV 



(y) 



exp(_(i + ^0 
exp(-exp(-^)), ^ = 



(7) 



where 1 + £^ > for ^ dColes 



2001 



Beirlant et al. 



2004). The three parameters of the 



GEV are location /i, scale a and shape £. For threshold excesses, the generalized Pareto distribution 
(GPD) is usually applied. It is given by 



GPD 



(y) 



exp(-S 



where u is a sufficiently large threshold, a u is the scale and £ the shape parameter. 
For £ 7^ 0, a closed form expression of the CRPS for the GEV is given by 



(8) 



Scrp[Fgev^oi 



y) 



a 



1 - 2F t 



GEVc 



5/0 



(y) 



2«r(l - - 2r z (1 - £, - log iW^ (y)) 



(9) 



where T denotes the gamma function and T/ the lower incomplete gamma function. For £ = 0, this 
expression reads 

S C Rp{F GE v i=0 ,y) =fi~y + ci[C- log 2] - 2<7#i(logF GJ ^ e=0 (y)), (10) 

where C ~ 0.5772 is the Euler-Mascheroni constant and Ei{x) = jdt is the exponential 
integral ( Abramowitz and Stegun[ 1964). The derivations are given in Appendix A. In the case of 
the GPD, we get 



S G Rp(F G PD^ 



y) 



u-y- 
2a, 



a 



1 - 2F< 
1 

+ 



GPD, 



(y) 



GPD^jt 



(y) 



for £ 7^ and 



S G Rp{FGPD £=0 ,y) =y-u-a v 



2F< 



GPD, 



Jy)- 



y-u 



(11) 



(12) 



for £ = 0. The derivations for a GPD are given in Appendix B. 

Figure [T] shows the CRPS, the ignorance score and the quantile score for r = 0.5 for the GEV 
with shape parameter £ = —0.5 (Weibull type), £ = (Gumbel type), and £ = 0.5 (Frechet 
type), as well as for the GDP with the same parameter values. The ignorance score has a sharper 
minimum than the other scores and takes its minimum at the mode of the predictive distribution 
while the other two scores take their minimum at the median of the distribution. For the CRPS 
under the GEV, this can easily be shown by introducing the median, Faey(0.5) = yU+|((log2)~^ — 
1) for £ 7^ and Fgev(0-5) = — oTog(log2) for £ = 0, into the derivative of the score with 
respect to y. 
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Figure 1: The CRPS (solid lines), the ignorance score (dashed lines), and the quantile score for r = 0.5 
(dotted lines) as a function of the observation value for the GEV (top row) and the GPD (bottom row). The 
predictive distributions have zero location (/i = 0), standard scale (a = 1), and shape £ = —0.5 (first 
column), £ = (second column), or £ = 0.5 (third column). The corresponding predictive densities are 
indicated in gray. 



3 Predicting peak wind speed 

We apply the verification measures discussed in the previous sections to predictions of daily peak 
wind speed at a location in the Netherlands over a period from January 1, 2001 to July 1, 2009. 
The goal is to derive a prediction model for daily peak wind speed using the observations of other 
meteorological variables as covariates with out-of-sample data used for the parameter estimation. 
In our application, the covariates are observed at the same time and location as the observation. 
We thus derive nowcasts rather than real forecasts in time for daily peak wind speed. In view of the 
sparse observational network for gust observations, such an analysis can be very useful, e.g. for 
forecast verification of wind gust warnings ( jFriederichs et al.[ |2009| ). Furthermore, the covariates 
might easily be replaced by the corresponding outputs from numerical weather prediction (NWP) 
models, thereby providing a probabilistic methodology for model output statistics, as NWP models 
only give diagnostic estimates for peak wind speed. 

For the modeling, we use generalized extreme value distributions under both a Bayesian fore- 
casting framework and a frequentist framework where the parameters are estimated by minimizing 
a proper scoring rule. That is, we assume that peak wind speed observations follow a non- stationary 
GEV ([7]) where the parameters depend on covariates x through functions of the form 

ju(x) = /J® + fnxi + ji 2 x 2 + . . . , h(cr(x)) = cr + a lXl + a 2 x 2 + . . . , £ = Co, (13) 



where h(-) is a link function in analogy to generalized linear models (Fahrmeir and Tutz 2001 1 
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The shape parameter is very sensitive to sampling uncertainty. Including covariates to the shape 
parameter largely increases the uncertainty not only of the shape parameters estimate itself, but 
also of the other parameters (Fried enchs et al.[ 2009| ). The benefit in turn is generally small. 



For this reason, the shape parameter is kept independent of covariates. However, in a forecasting 
procedure, alike the other parameters, the shape parameter is estimated on a training data set and 
used to provide out-of-sample prediction. This out-of-sample prediction and verification is realized 
by a cross-validation procedure (see section 3.4| ). 

As an alternative forecast, we apply a standard approach in meteorology and wind engineering, 



a generalized linear model (GLM, e.g., jMcCulla gh and Neldeq , |T999 ) for the gust factor. The gust 
factor G is defined as 



G 



Vfx 

Xff 



1. 



(14) 



where Xff denotes the mean wind speed observation and yf x the peak wind speed observation. 
The gust factor is generally assumed to follow a lognormal distribution, while the most simple 
approaches assume a constant gust factor. We base our approach on the work of Weggel (1999) 
and Jungo et al. (2002) and use a GLM for lognormal residuals. This corresponds to a standard 
linear regression model for log G with the expected value being modeled as 



E[log G] = p + ftaci + /3 2 x 2 + 



(15) 



The GLM is estimated using iteratively reweighted least squares as provided by the glm function 
in R (|R Development Core Team[ |2010|). 



3.1 Data 

The Royal Netherlands Meteorological Institute freely provides daily weather data from numerous 
observation locations in the Netherlands^ The meteorological station data contain observations 
of temperature, sunshine, cloud cover and visibility, air pressure, precipitation, mean wind, and 
peak wind speed for the specified station. We have chosen the station number 210, Valkenburg, 
with hourly observations from January 1, 2001 to July 1, 2009 which gives us a total of 3104 
daily observations. The daily observed peak wind yf x is measured as a 24-hour (00UTC-00UTC) 
maximum over the observed hourly wind gusts. The gust observations are measured in 1 m/s which 
makes the data quasi discrete. In order to obtain a more continuous spectrum of values, we added 
a uniformly distributed noise to the gust observations. The hourly mean wind speed is given by 
the mean wind speed during the 10-minute period preceding the time of observation. We process 
these data to daily mean wind x^-and daily wind variance ^'f/Var by taking the average and the 
variance, respectively, over the 24 hourly observations as above. A similar procedure yields the 
mean rain rate xrr and the maximum rain rate x r Max over ^4 hours. Finally, we also consider 
the mean 24-hour pressure xp and the 24-hour pressure tendency x^p as potential covariates. The 
pressure tendency x^p is estimated by fitting a linear trend function (i.e., linear function in time) 
to the 24 hourly pressure observations. The slope estimate yields an estimate of x^p. Note that the 
covariates are normalized before entering as a covariate. Normalization is not necessary but makes 
the inference procedures more stable. 

'http : / / www . knmi . nl/ c 1 imat o 1 ogy / da ily_dat a /download . html 
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Figure 2: The goodness of fit of a seasonal GEV model applied to 24h peak wind speed observations at 
Valkenburg, the Netherlands from January 1, 2001 to July 1, 2009 measured by a QQ-plot (a) and a PP-plot 
(b). The dark gray line represents the seasonal GEV model estimates, the light gray lines indicate the 95% 
confidence interval derived by parametric bootstrapping, and the black dots denote the empirical estimates. 



Peak wind speed refers to the maximum wind speed in a given time period. In operational 
terms, this refers to the highest 3-second average value recorded in a particular time period (e.g. 
1 hour). Consequently, a block maxima approach using a GEV represents a natural approach 
to model the peak wind speed observations. Figure [2] represents a goodness of fit for a GEV 
model fitted to the daily peak wind speed observations in our data set using maximum likelihood 
estimation. In order to account for seasonal variations, we included the annual cycle in terms of a 
annual sine and cosine function as covariate to the location and scale parameter. Both parameters 
exhibit significant annual changes. The relation of the shape parameter to the annual cycle is 
not significantly different from zero, neither any relation to a half-annual sine or cosine function. 
Although significant, the annual cycle accounts for only about 5% of the variance in the daily 
peak wind speed observations. The seasonally varying GEV exhibits a shape parameter of about 
£ = —0.022 with a standard error of 0.014 and provides an overall acceptable fit even though it 
seems to slightly overestimate high quantiles, see Figure [2fa). 

3.2 Optimum score estimation 

In our frequentist approach, we apply both classic maximum likelihood estimation (GEV-MLE) 
and minimum CRPS estimation (GEV-CRPS) for the parameter estimation of the predictive distri- 
bution function. For the minimum CRPS estimation, we minimize the function 

J2 ScRp ( Fi > y fxJ 

i 

where the index runs over the training set, Scrp is given by either ([9]) or ( fl~0| ) depending on the 
value of the shape parameter £, and F: L is the predictive distribution for a covariate vector Xj. The 
predictive distribution for Y is a GEV with 

P(Y < y\x) = F GEV (y;n(x),(T(x),Z), (16) 

where the parameters /i(x), cr(x), and £ are given by < [T3| ). Here, we apply both the identity and the 
logarithmic link function for the scale parameter a. The latter ensures a positive scale parameter 
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Figure 3: The in-sample ignorance score for the stationary GEV over the full data set. The plots show 
changes in the score when the location fiQ, scale uq, and shape £o are varied pairwise while the remaining 
parameter is fixed at the MLE. The black dots indicate the MLE and the white dots the minimum CRPS 
estimate. 




Figure 4: The in-sample CRPS for the stationary GEV over the full data set. The plots show changes in 
the score when the location hq, scale ao, and shape £o are varied pairwise while the remaining parameter is 
fixed at the minimum CRPS estimate. The white dots indicate the minimum CRPS estimates and the black 
dots the MLE. 



and guarantees valid predictions. 

Numerical optimization was first performed using the quasi-Newton Broyden-Fletcher-Goldfarb- 



Shanno (BFGS) and the Nelder-Mead methods implemented in R ( |R Development Core Team 



2010 ) in parallel and the best estimates in terms of optimum score were used. However, as we 



included more covariates, the optimization became less stable. For that reason, all optimum score 
estimates are derived using a Simulated Annealing algorithm (SANN). The SANN method in R 
by default uses a variant of Simulated Annealing given in Belisle (1992) and is useful for getting 
good estimates on rough surfaces. Although the SANN algorithm is slower than the BFGS and the 
Nelder-Mead, it provides more robust estimates. 

In order to compare the maximum likelihood and the minimum CRPS estimation, we investi- 
gate two-dimensional surfaces of the in- sample ignorance score and the CRPS over the entire data 
set. The link function for the scale parameter a is the identity. Figure [3] shows two-dimensional 
surfaces of the log-likelihood function, or the ignorance score, for pairwise combinations of fx , 
do, and £o- In eacn plot, the other parameter is fixed at its respective maximum likelihood estimate 
(MLE). The log-likelihood function is displayed over a range that covers the parameter estimate 
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plus-minus three standard errors as derived from the profile likelihood ( Coles [ |2001 ). Similar 
two-dimensional surfaces are displayed for the CRPS in Figure |4} where in each plot, the other pa- 
rameter is fixed at its respective minimum CRPS estimates. The CRPS is displayed over the same 
parameter range as the log-likelihood function in Figure [3j The MLE and the minimum CRPS 
estimates differ only slightly, or by less than two standard errors for all parameters (cf. the black 
and white dots in Figure [3] and [4]). The pairwise dependence of changes in the parameter values on 
the function surfaces is quite different for the two functions. It is particularly strong for the CRPS 
where the dependence is negatively oriented. Further, the CRPS function is less sharp than the 
log-likelihood function in the direction of £o- 



3.3 Bayesian forecasting framework 



Let l(yf x \0, X) denote the likelihood function under the GEV model in (|7j), where y^ x are the 
observed peak winds over the training set, X is the matrix of corresponding covariates, and is 



a vector of the model parameters in ( 13 ) with a logarithmic link function for the scale parameter. 
Given a prior distribution f{6) for 6, the joint posterior distribution of the parameters given the 
data is 



/(% fx ,x)ocZ(y fx |0,x)/(0). 

The predictive density for a new observation y' with a covariate vector x' is given by 



/(vK y& , x) = J i(y'\e,^)f(e\ yfx ,x)de. 



(17) 



While this density is generally not a GEV density, it has the advantage that it reflects uncertainty 



both in the model and in the future observations, see e.g. |Coles| ( |2001| ). For the GEV model, the 
integral in ( fTTj ) is usually intractable. However, the density may easily be approximated by a large 
sample from the predictive distribution, see Hoff| (2009 > section 4.3). 

We use non-informative independent normal priors for the parameters and set <jj ~ N(0, 10 4 
for all j, and £ ~ ^(0, 10 2 ). For stationary GEV distributions, it is common to use the logarith- 
mic link function for the scale parameter, see e.g. |Stephenson and Tawn|(|2~004) and |Galiatsatou 



et al.| ( [2008| ). We follow their practice and only use the logarithmic link function for this part of the 
analysis. Our priors correspond to setting /i , log(cr ) ~ N(Q : 10 4 ) and £ ~ N(0, 10 2 ) in the sta- 
tionary case. |Stephenson and Tawn] ( |2004[ ) and |Galiatsatou et aL] ( |2008[ ) argue that, in the stationary 



case, a trivariate normal distribution would be preferred here, as a negative dependence between 
do and £o is a priori expected. As our inference is based on a relatively large data set, we refrain 
from this given the computational complexity that such a prior would induce in the non- stationary 
case. We then apply a Metropolis within Gibbs algorithm to obtain samples from the marginal 
posterior distributions using normal distributions with a small variance as proposal distributions 
for the updates. For each parameter estimation, we run 100.000 iterations of the Metropolis within 
Gibbs algorithm with a burn-in period of 25.000 iterations. Classical diagnostics such as trace- 
plots and running mean plots (not shown) show good mixing and indicate that this is sufficient for 
convergence. 

In this case study, we consider six possible covariates and each of these can influence both 
the location and the scale parameter. We thus have a total of 2 12 possible models in our model 
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Table 1: Average posterior inclusion probabilities for the covariates in a Bayesian model selection frame- 
work over all years. The probabilities are given in percentages. 



Variable 


Location 


Scale 


Mean wind speed (if) 


100.0 


100.0 


Wind variance {fFVar) 


100.0 


100.0 


Mean rain rate (rr) 


0.1 


0.0 


Maximum rain rate (rMax) 


100.0 


100.0 


Pressure (P) 


100.0 


10.8 


Pressure tendency (dP) 


100.0 


0.0 



space. Besides considering specific models, we also perform a variable selection procedure over 
the entire model space. That is, we write each regression coefficient fj,j for j > 1 as fij = ZjVj, 
where Zj G {0, 1} and Uj G E such that the Zj's indicate which regression coefficients are non-zero. 
The regression equation for jj, becomes 



/x(x) =ij,o + z 1 v 1 Xff+ z 2 V2XffVar + z ^ x rr + ^^rMax + Z ^ X P + z 6V 6 x dP , 

and similarly for log(cr(x)). Each model indication parameter Zj is a priori equal to either or 
1 with probability 1/2, while the regression parameters Vj have independent N(0, 10 4 ) priors as 
before. Again, the parameters are updated iteratively using a Metropolis within Gibbs updating 
scheme. 

Our algorithm is equivalent to a reversible jump MCMC algorithm ( Green[ [T995 ) and the pos- 
terior inclusion probability of a covariate x equals the average value of the corresponding indicator 
variable z over the posterior sample. A related Bayesian inference framework is proposed in |EI| 
Adlouni and Ouarda ( |2009[ ) where the authors suggest a birth-death MCMC procedure for covari- 
ate selection in generalized extreme value models and apply their framework to annual maximum 
precipitation data. However, the birth-death MCMC algorithm is quite complex to implement 
compared to the fairly simple regression variable selection method described in Hoff ( 2009| ) which 
we have applied. For more details on the regression variable selection algorithm and Bayesian 
inference in general, see Hoff|( |200"9| ). 



3.4 Model selection 

In order to assess the predictive performance of our prediction approaches, we pursue a common 
approach in atmospheric science and use an out-of-sample verification. The prediction is per- 
formed in a cross-validation manner, in that we iteratively leave out one year of data, estimate 
the parameters of the statistical models based on the remaining data, and then perform an out-of- 
sample prediction. In this way, independent predictions for the complete time series are generated 
and used for verification and model selection. 

Average posterior inclusion probabilities for the covariates under the Bayesian regression vari- 
able selection framework are given in Table [TJ Three covariates, mean wind speed (if), wind 
variance (ffVar), and maximum rain rate (rMax) have very high inclusion probabilities for both the 
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Table 2: Average predictive performance of non- stationary peak wind speed predictions at Valkenburg from 
January 1, 2001 to July 1, 2009. The performance is measured in terms of the continuous ranked probability 
score (CRPS), which is given in meters per second, and the ignorance score (IGN). The covariates considered 
here are mean wind speed ff, wind variance ffVar, mean rain rate rr, maximum rain rate rMax, pressure P, 
and pressure tendency dP. VS stands for Bayesian variable selection approach. The link function h(a) used 
for the scale parameter is indicated in parenthesis and the optimal score in each column is indicated in bold. 



Covariate(s) 


None 


ff 


ff 


ff 


ff 


VS 








ffVar 


ffVar 


fiVar 












rMax 


rMax 














pfi only 


















CRPS (m/s) 


GEV-MLE (id) 


2.48 


1.07 


0.86 


0.82 


0.80 




GEV-MLE (log) 




1.07 


0.86 


0.82 


0.80 




GEV-CRPS (id) 


2.48 


1.07 


0.86 


0.81 


0.79 




GEV-CRPS (log) 




1.07 


0.86 


0.82 


0.80 




GEV-Bayes (log) 


2.49 


1.08 


0.84 


0.81 


0.79 


0.80 


lognormal GLM 


1.37 


1.09 


0.95 


0.92 


0.90 




IGN 


GEV-MLE (id) 


2.85 


2.00 


1.78 


1.73 


1.71 




GEV-MLE (log) 




2.00 


1.78 


1.73 


1.72 




GEV-CRPS (id) 


2.85 


2.00 


1.78 


1.73 


1.72 




GEV-CRPS (log) 




2.00 


1.78 


1.74 


1.73 




GEV-Bayes (log) 


2.85 


2.00 


1.78 


1.73 


1.71 


1.71 


lognormal GLM 


2.20 


2.03 


1.90 


1.86 


1.85 





location parameter ji and the scale parameter a in ( fT3| ). In addition, pressure (P) and pressure ten- 
dency (dP) also have high posterior inclusion probabilities for the location parameter ji. Note that 
the inclusion probabilities are calculated after the burn-in period has been removed. Especially for 
the low inclusion probabilities, the chains show a high degree of mixing in the burn-in period. Al- 
though seasonality may be captured by the existing covariates, we have further investigated adding 
annual cycles to the model. The results indicate that a small improvement in predictive perfor- 
mance (i.e. of the order of 0.02) may be obtained by including an annual cycle in the location 
parameter. Consequently, only a small fraction of seasonality is left unexplained when no annual 
cycle is concidered explicitly. For clarity of exposition, we omit this factor in the following. 

Table [2] shows the average CRPS and ignorance score over all days in the test set for all the 
prediction methods considered in this study and different sets of covariates for each method. To 
calculate the CRPS in Table [2j we have applied ([9]) and (10) for the GEV models while for the 
Bayesian method, we have applied the approximation in ([3]). Since the logarithm of the gust factor 
log(G) follows a normal distribution, the CRPS of the lognormal GLM is calculated using the 
closed-form expression in |Gneiting et al. (2005 1. To calculate the ignorance score for the Bayesian 
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method, we obtain a non-parametric density estimate for the predictive density based on a large 
sample from the posterior predictive distribution. The ignorance score for the lognormal GLM 
reads S/gaK/a/ - , log(G9) = —\og(fj^(\og(G))/(y — ff)dy), where y is the respective gust value 
and fx is the predictive normal distribution of log(G). Note that the scores are calculated on cross- 
validated predictions (Sec. |3.4[ ), in order to provide an out-of-sample prediction and verification. 
Further, we estimate the sampling uncertainty of the CRPS and ignorance score via the bootstrap 



method (Efr on and Tibshirani[ 1 1993] ). To that end, we recalculated the scores by resampling the 
prediction-observation pairs using the bootstrap method with replacement. The standard deviations 
of the CRPS and ignorance score within a 1000 member bootstrap sample vary between 0.01 and 
0.03 for all the GEV methods. 

Including covariates in the prediction improves the predictive performance significantly com- 
pared to using a stationary model. The decrease in CRPS amounts to about 1.4 (1.7) when includ- 
ing ff (ff, ffVar, rMax, P, dP). With respect to a sampling uncertainty of about 0.03 this decrease 
is highly significant. For the models compared here, all method show the highest predictive skill 
when only covariates with high posterior inclusion probability are included. Including P and dP 
in the scale parameter and rr in both the location and the scale parameter, or a subset of these, 
does not provide significant improvements. Although the lognormal distribution seems to provide 
a better fit to the data in the stationary case, the non- stationary GEV models clearly outperform the 
lognormal GLM except if only ff is used as covariate. The improvement of the CRPS amounts to 
about 0.1 which is well above the sampling uncertainty of the CRPS. The differences between the 
various GEV approaches are more subtle and not very significant. For the frequentist methods, the 
identity link for the scale parameter yields slightly higher skill than the logarithmic link, especially 
in terms of the ignorance score for the best non-stationary model. For the more complex models, 
with three covariates or more, the optimum CRPS methods performs minimally better than the 
maximum likelihood methods in terms of the CRPS and the other way around for the ignorance 
score. The best scores are generally obtained within the Bayesian framework. 

3.5 Predictive performance 

Based on the results in the previous section, we now focus on the non- stationary GEV with location 
parameter 

(i(x) = y,Q + fiix ff + l^2Xffy ar + /^rMax + ^ X P + ^ x dP ( 18 ) 
and scale parameter 

h(a(x.)) = cr + &iXff+ a i x ffVar + a 3 x rMax> ( 19 ) 

where h is the identity link for the optimum score estimation and the logarithm for the Bayesian 
method. Although the differences are barely significant in table[2} there is evidence that the identity 
link model provides slightly better skill compared to the logarithm when using optimum score 
estimation. However, the following results also apply to the logarithm link model. For comparison, 
we also investigate the performance of the lognormal GLM method. Verification is then based on 
a variety of diagnoses. 

The calibration, or the reliability, of the predictions is assessed by using residual quantile plots 
based on the standard Gumbel distribution, see Figure |5J The first column of Figure [5] shows the 
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Figure 5 : Residual quantile plots on Gumbel scale for models with the location parameter in ( 1 8 1 and the 
scale parameter in ( [T9] >. The prediction methods compared here are the GEV-MLE method with identity 
link (1st row), the GEV-CRPS method with identity link (2nd row), the Bayes method (3rd row), and the 
lognormal GLM (4th row). The first column shows the QQ-plots over all forecasts, the second column 
shows forecasts with Xff > (7.75, and the third column shows forecasts with Xff < g.75. The vertical lines 
indicate (from left to right) the residual 0.5, 0.9, 0.95, and 0.99 quantiles. 



residual quantile plots (section 2.1 ) for the complete set of observations. For the lowest 99% of 
the residuals, the bulk of the residuals lies within the 95% uncertainty band of the residual distri- 
bution for all three GEV methods. However, the models show a significant underestimation of the 
highest 1% of the residuals. The underestimation is even more prominent for the logarithmic GLM 
where significant underestimation is observed for the highest 5% of the residuals. An additional 
stratification is based on the covariate Xff, where predictions with below normal mean wind speed 
(xff < q.75) and above normal mean wind speed (xff > g 75 ) are considered separately, where q, 75 
is the 0.75 quantile of the Xff observations. In situations where the mean wind xg-is above its 
0.75 quantile, all models provide calibrated predictive distributions (Fig. [5J second column). The 
uncalibrated large residuals only occur during weak wind situations (Fig. [5} third column). Since 
strong gusts are very unlikely during these weather situations, this miscalibration is probably not 
highly relevant for the gust prediction. 

The three GEV methods in Figure [5] thus all appear to be fairly well calibrated and we can 
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Table 3: Average Brier score at three thresholds and quantile score at four quantiles of peak wind speed 
predictions at Valkenburg from January 1, 2001 to July 1, 2009. The GEV methods apply the location 



parameter in ( 18 ) and the scale parameter in ( 19 1. The link function for the scale parameter is indicated in 
parenthesis. The best performance in each column is indicated in bold. 



Brier Score (1/100) Quantile Score (1/10 m/s) 





14m/s 


18m/s 


25m/s 


0.75 


0.9 


0.95 


0.99 


GEV-MLE (id) 


6.19 


3.27 


0.41 


4.86 


2.97 


1.88 


0.57 


GEV-CRPS (id) 


6.18 


3.21 


0.39 


4.81 


2.95 


1.87 


0.56 


GEV-Bayes (log) 


6.16 


3.24 


0.43 


4.87 


2.98 


1.87 


0.58 


lognormal GLM 


6.55 


3.70 


0.66 


6.73 


3.47 


2.20 


0.69 



Table 4: Average skill scores (in percentages) of peak wind speed predictions at Valkenburg from January 



1, 2001 to July 1, 2009. The GEV methods apply the location parameter in ( 18 ) and the scale parameter in 



( 19 1. The link function for the scale parameter is indicated in parenthesis. The skill scores for the thresholds 
is based on the Brier score and the skill scores for the quantiles is based on the quantile score. The reference 
forecast is the stationary GEV-MLE method. The best performance in each column is indicated in bold. 



Threshold 



Quantile 





14m/s 


18m/s 


25m/s 


0.75 


0.9 


0.95 


0.99 


GEV-MLE (id) 


68.8 


64.0 


52.5 


68.7 


68.2 


66.7 


63.7 


GEV-CRPS (id) 


68.9 


64.7 


54.6 


69.0 


68.4 


67.0 


64.6 


GEV-Bayes (log) 


69.0 


64.4 


50.4 


68.7 


68.1 


67.0 


63.3 


lognormal GLM 


67.0 


59.4 


23.7 


56.7 


62.3 


61.1 


56.3 



compare the sharpness of the predictive distributions in concurrence with the forecasting principle 
of |Gneiting et al. (2007). The GEV-CRPS method yields the sharpest predictions with an average 
predictive standard deviation of 1.43 (±0.44), while this value is 1.48 (±0.57) for the Bayesian 
method and 1.51 (±0.48) for the GEV-MLE method. The values given in the parentheses are the 
standard deviations of the daily predictive standard deviations. The large day-to-day variation 
in the sharpness is to be expected as different levels of uncertainty are associated with different 
prediction scenarios. In general, the forecasts are more uncertain for higher expected peak wind 
speeds. An alternative approach to assess the sharpness is to calculate the average width of sym- 
metric prediction intervals, see e.g. |Gneiting et al.| p005) and |Raftery et al. (2005). Similarly, the 
average coverage of the prediction intervals may be used to assess calibration. 

Table [3] shows the Brier scores at three high thresholds and the quantile scores for four high 
quantiles for our four methods. Wind speeds of 14 — 18m/s are generally considered as near gale, 
wind speeds of 18 — 25m/s are defined as gale and strong gale, while storm values are 25m/s and 
above. The corresponding skill scores are shown in Table [4] where the reference forecast is the 
stationary GEV-MLE method. The standard error for the Brier score is (0.31, 0.25, 0.09) for the 
thresholds (14, 18, 25) and the standard error for the quantile score is about 0.10 for each quantile. 
These scores measure predictive performance at certain points of the predictive distribution or the 
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quantile function, and may indicate local deficiencies. The lognormal GLM provides significantly 
less skill for all thresholds and probabilities, except for the 0.99 quantile where the difference is 
no longer significant. All methods based on a non-stationary GEV distribution perform similarly, 
with the GEV-CRPS methods showing marginally better performance than the other two methods. 

Note that the scores in Table [3] cannot be compared across columns as these are based on dif- 
ferent subsets of the data. The apparent improvement in the scores for higher thresholds/quantiles 
is merely an effect of the decreasing data set on which the values are based. This is evident if the 
results of Table [3] are compared to the results on Table |4} As the same reference forecast is used 
for all the columns in Table |4} we can compare the results across columns relative to the reference 
forecast. Here, we see that the performance of all the methods decreases somewhat with higher 
thresholds/quantiles compared to the stationary GEV-MLE reference method. 

Alternatively, weighted proper scoring rules may be applied to focus on areas of special interest 
in the predictive distribution. Diks et al. ( 201 1[ ) consider weighted versions of the ignorance score 
while Gneiting and Ranjan ( 201 lj ) discuss approaches to weight the CRPS. The weight function 
can here either be based on quantiles using the representation in ([6]) or it can be based on thresholds 
using the representation in ([T]). Note that the weight functions must be defined independent of the 
observed value to uphold propriety (|Gneiting and Ranjan] 12011]). 



3.6 Parameter estimation 

Verification in terms of scores and residual quantile plots provides average performance of the 
prediction method. Another aspect is the variance and covariances of the parameter estimates. 
Bayesian prediction provides estimates of the posterior distribution of the parameters, whereas op- 
timum score estimation only gives approximate covariances based on the profile score function (i.e. 
profile likelihood ( Coles [ 200 1[ ) in maximum likelihood estimation). An indication of uncertainty 



in the parameter estimation is provided by the variability of the estimates over the different train- 
ing data set used in cross-validation. In the cross-validation procedure we successively remove one 
year of data, which provides us with nine estimates of each parameter for each method. 

Figure [6] shows the parameter estimates for the yearly shape parameter £ under the three GEV 
methods discussed in the previous section. The 90% posterior intervals obtained with the Bayesian 
method are fairly constant between years, while there is a large variation in the estimated values 
under both GEV-MLE and GEV-CRPS. Furthermore, the standard errors are significantly greater 
for the GEV-CRPS method. This is in accordance with Fig. [T] where the minimum of the CRPS is 
less sensitive to changes in £ and the discussion related to Fig. |4j The MLE and CRPS optimum 
score estimates of £ are uncorrected which suggests that the variability in the estimates is not 
due to interannual changes in the data. The majority of the estimates are significantly negative, 
indicating that the non- stationar y GE V models are of Weibull type, in contrast to the goodness of 

where £ is only slightly negative. 

ocation coefficients in ( [T8j ) are shown in Figure |7} For clarity 



fit analysis performed in Section 



3.1 



The parameter estimates for the 
of the presentation and as the 90% posterior intervals are very similar for each run of the Bayesian 
method, we only show the 90% posterior interval of the joint posterior sample from all years. In 
general, there is no apparent correlation between the maximum likelihood estimates and the opti- 
mum CRPS estimates. Furthermore, the estimates for both methods are quite variable for different 
training sets and this variability cannot be explained solely by the uncertainty in the estimation 
which is of similar magnitude for both methods and across training sets. All three methods display 
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Figure 6: The estimated yearly shape parameters for the GEV methods using the location parameter in ( 18 1 



and the scale parameter in (19). The gray boxes indicate the 90% posterior intervals for for £ under the 



Bayesian model, the black dots (•) indicate the parameter estimates under the GEV-CRPS(id) model, and 
the squares (□) indicate the parameter estimates under the GEV-MLE(id) model. The vertical lines indicate 
the respective standard errors. For comparison, the shape parameter of the observations estimated via MLE, 
£ = —0.013, is indicated with a dotted line. 



the largest uncertainty in the estimate for the intercept coefficient. All methods show a strong pos- 
itive dependence of the GEV location parameter on ff and ffVar, a weaker dependence on rMax, 
and a negative dependence on pressure p and pressure tendency dP. 



The Bayesian posterior distributions for the scale coefficients in ( 19) cannot be directly com- 
pared to the corresponding estimates under the frequentist approaches as we have applied the iden- 
tity link for the frequentist approaches while we use a logarithmic link for the Bayesian method. 
As for the location coefficients, no apparent correlation can be found between the MLE coefficient 
estimates and the minimum CRPS estimates and the variability between training sets is generally 
greater than expected based on the estimation uncertainty alone. Like the location parameter, the 
non- stationary GEV scale parameter positively depends on ff, ffVar, and rMax. Hence, predictive 
uncertainty increases for large expected peak wind speed. 



4 Discussion 



As e.g. Dawid ( 1984) and Gneiting (2008) have argued before us, predictions for uncertain events 



ought to include an estimate of the associated uncertainty. This is easily obtained within a proba- 
bilistic framework where the predictions take the form of probability distributions. Proper scoring 
rules such as the Brier score or the CRPS are widely used to assess the predictive skill in proba- 
bilistic weather forecasting, see e.g. Wilks ( 201 In this paper, we discuss how proper scoring 
rules may be used for out-of-sample model selection and verification for extreme value distribu- 
tions. We present a closed-form expressions of the CRPS for this class of distributions. With 
these expressions at hand, the CRPS may be used for optimum score estimation as an alternative 
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Figure 7: Optimum score estimates of yearly location parameters for the GEV methods using the location 
parameter in ( p~8] ) and the scale parameter in ( fl~9| ). The dots represent the maximum likelihood estimates (x- 
axis) and the optimum CRPS estimates (y-axis) for the different years. The gray lines indicate the respective 
standard errors. The gray bars on each axis indicate the 90% posterior intervals for the parameters under the 
Bayesian model. 



to maximum likelihood estimation or Bayesian inference. 

In a case study, we compare various approaches to derive non- stationary distributions for daily 
peak wind speed observations. The non-stationarity is built in by including covariates, i.e. me- 
teorological parameters that are observed together with the gust measurement. Our competing 
methodologies comprise a lognormal GLM for the wind gust factor and non-stationary GEV mod- 
els for the daily wind gusts. The non- stationary GEV approaches provide significantly higher skill 
than the corresponding lognormal GLM. This confirms findings in Friederichs et al. ( 2009| ) that ex- 
treme value theory provides an appropriate and theoretically consistent statistical model for wind 
gusts. 

Due to the close relationship between wind speed and peak wind, mean wind speed is the main 
covariate in our model. However, the predictive performance is significantly improved by also 
including information on additional weather variables. The most informative covariates besides 
mean wind speed are the variability of the mean wind speed, the maximum rain rate, and the 
pressure tendency throughout the day. With six potential covariates for both the location and 
the scale parameter, our model space includes a total of 2 12 models. This large space can easily be 
searched using a Bayesian variable selection procedure which returns different covariate sets for the 
location and for the scale. Overall, the robustness and the predictive performance of the Bayesian 
framework is very good though it should be noted that the Bayesian inference is considerably 
slower than the frequentist methods. 



Gneiting et al. (2005 ) compare minimum CRPS estimation and maximum likelihood estimation 
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in a prediction framework and show that the former returns forecasts with slightly better overall 
predictive performance and significantly improved calibration. Their results confirm well with our 
findings in that the optimum CRPS estimation outperforms the maximum likelihood approach in 
nearly all aspects of our verification even though the differences are sometimes small. A drawback 
of the minimum CRPS estimation is its weak discriminant power with respect to the GEV shape 
parameter. 

In our case study, we have performed an extensive analysis of the properties of the various 
scoring rules and associated inference methods for the GEV distribution. The GEV is the self- 
evident distribution in the case of block maxima, i.e. in the case of peak wind speed. In the case 
of threshold excesses, a peak-over-threshold (POT) or Poisson point process approach would be 
more appropriate. We assume that similar results will hold for the POT approach using a GPD 
distribution and the closed-form expression given in this paper. Since the Poisson point process 
is generally represented in terms of GEV parameters, optimum score estimation using the CRPS 
should be possible as well. 
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Appendix A: Derivation of the CRPS for a GEV 

For the deviation of the CRPS under the GEV, we apply the representation in ([6]), where the CRPS 
is presented as an integral over the quantile score in ([5]). We can write the quantile score as 

S%(F, y)=r[y- F-\r)} - l{r > F(y)} [y - F'^r)] 

from which it is easily seen that 

S CR p(F,y) =y[2F(y) - l] - 2 / rF^^dr + 2 [ F~ l 

JO JF(y) 

The quantile function of the GEV is given by 

{ \i- o-log(-logr) , £ = 
For a non-zero shape parameter £ ^ and with F = F G ev^ q , we obtain 

rl pi 

2 TF~ 1 {r)dT = n-- + 2 r(- log r) _€ dr 



V)dr. 



(20) 



(21) 



and 



2 / F- l (r)dr = 2 

>F(y) 



a 



[1 - F(y 



a 



2- I (-logr) "dr. 

5 7F(y) 



(22) 



(23) 



To solve the two remaining integrals in ( [22] ) and ( [23] ), we use that 



and 



J r(- log r)"«dr = 2^(1 -2 logr) 
y (- log T)- f dT = r u (l-£,- logr), 



where r u (a,r) = J"°°t a x e *dt is the upper incomplete gamma function. By combining (20), 
d22b, ([23]), and the integral equations above, we get 



ScRp(FGEV £ -io,y) 



y-v+ 



[2 J P G£V5/0 (2/)-l]-|[2^r(l-£)-2r i (l-e,-logF G ^ ( 2 /)) 



where Ti(a, r) = T(a) — T u (a, r) is the lower incomplete gamma function. 
For a Gumbel type GEV with £ = 0, similar calculations show that 

SoRp(F G EV i=0 , y) =[V ~ lA [2F G EV (=0 (y) - 1] + 2cr / r log(- logr)d? 

</o 

2ct / log(— logr) c/r. 



(24) 



24 



The first integral is solved as 



r log(— log r)dr = - r 2 log(— log r) — Ei{ "2 log r ) 



where Ei(x) = f* jdt is the exponential integral also given by 



Ei{x) =C + log{\x\) + J2 



OO u 

X 



k=l 



k\k 



with C ~ 0.5772 being the Euler-Mascheroni constant. Further, we use 



log(— log r)dr = r log(— log r) — Ei(\og r) 



to solve the second integral in ( |24| ). This leads to a closed-form expression for the CRPS under a 
Gumbel-type GEV with 



ScRp{FGEV (=0 ,y) 

= [y-lA [2F GEV(=0 (y)-i\ 



+ a lim 



v 2 log(— log i/) — Ei{2 log i/) 



lim 



rj 1 log(— log rj) — Ei(2 log 77) 



2<T lim 



z/ log(— log z/) — _Ei(log 1/) 
-(y-fx)- 2a Ei (log F GEV(=0 (y)) 



2a 



+ cr lim 



l) 2 log(-logz/) + C-log2 + ^ 



*W e=0 (y) + Ei ( log F GW?=0 (y)) 

(21ogi/) fc + 2(logz/) fc 



fc=i 



A;!*; 



(y - n) - 2aEi(logF GEV ^ =0 {y)) + a[C - log 2], 



where we use that \im. rj ^_ OQ Ei(rj) = 0. The series expansion to compute Ei(\og F GEV ^ (y)) 
converges rapidly for F GE v i=Q (y) much greater than 0. However, it may fail for very small values 



of F GE v, =0 (y). We use the GNU Scientific Library (Galassi et al. 2010) to compute Ei(- 



Appendix B: Derivation of the CRPS for a GPD 

The derivation of the closed-form expression of the CRPS for a GPD is very similar to the deriva- 
tions for a GEV in Appendix A with significantly simpler calculations. The GPD quantile function 
is given by 

F-i ( P )-S u-f + f(l- P )-t, ^0 
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By applying the representation of the CRPS given in ( [20] ), we get 

cr ?J 



S G Rp(F gpd y) 



y-u + 

y-u + 



[2F GPD ^{y)-l] 



(1 - r)^dr - r(l - r^dr 



- 2 



tit - 1) 



[2F GPD ^ (y)-l] 
1 



e-2 



l — F, 



GPD& 



(y) 



y - u 



Here, the second integral can be solved using integration by parts. For £ = 0, we obtain the 
following expression 

ScRp(F G pD e=0 , y) = (y- u) [2F GPDe=0 (y) - l] 



+ 2a u 



r\og(l — r)dT— / log(l — r)dr 

1 



= y -u-a u 
where we use the integral equation 



2F GPDe=0 (y) 



J log(l — x)dx = x[log(l — x) — l] — log(l 



X) 



and integration by parts. 
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